Skip to content

feat: add partial load rule types#19374

Merged
clintropolis merged 10 commits intoapache:masterfrom
clintropolis:partial-load-rules
Apr 29, 2026
Merged

feat: add partial load rule types#19374
clintropolis merged 10 commits intoapache:masterfrom
clintropolis:partial-load-rules

Conversation

@clintropolis
Copy link
Copy Markdown
Member

Description

Adds a new family of retention rules, loadPartialByPeriod, loadPartialByInterval, loadPartialForever, laying the groundwork for partial loading of V10 segment projections on historicals. This PR includes the rule classes, matcher abstraction, and cascade semantics only; the coordinator plumbing, wire format, and historical-side partial load path are deferred to follow-up work.

changes:

  • adds PartialLoadRule abstract class to capture load rules that should partially load a segment on some tier
  • adds IntervalPartialLoadRule, ForeverPartialLoadRule, and PeriodPartialLoadRule` implementations to mirror non-partial load rules
  • adds PartialLoadMatcher interface to match and select what to partial load
  • adds ExactProjectionPartialLoadMatcher and WildcardProjectionPartialLoadMatcher to do partial loading of projections
  • adds CannotMatchBehavior enum to describe behavior of PartialLoadRule when PartialLoadMatcher is unable to match a segment
  • since partial loading is not available yet, partial rules function as regular load rules until follow-up work
  • tests

changes:
* adds `PartialLoadRule` abstract class to capture load rules that should partially load a segment on some tier
* adds `IntervalPartialLoadRule, `ForeverPartialLoadRule`, and `PeriodPartialLoadRule` implementations to mirror non-partial load rules
* adds `PartialLoadMatcher` interface to match and select what to partial load
* adds `ExactProjectionPartialLoadMatcher` and `WildcardProjectionPartialLoadMatcher` to do partial loading of projections
* adds `CannotMatchBehavior` enum to describe behavior of `PartialLoadRule` when `PartialLoadMatcher` is unable to match a segment
* since partial loading is not available yet, partial rules function as regular load rules until follow-up work
* tests
this.matcher = matcher;
this.onCannotMatch = Configs.valueOrDefault(onCannotMatch, CannotMatchBehavior.FULL_LOAD);
}

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Partial rule fall-through is invisible to handoff checks

appliesTo returns false for a partial rule whose matcher cannot match and whose onCannotMatch is FALL_THROUGH. Handoff/readiness code that only asks whether the rule applies to the segment interval can therefore miss that this rule intentionally falls through to a later full-load rule, and can conclude that no load rule covers the segment. A realtime task for a segment without matching projections can then wait forever even though the cascade would load it through the following rule. The rule-selection path used by handoff needs to evaluate the same matcher/fall-through semantics as the coordinator cascade, not just interval applicability.

* version that introduces a new behavior falls back to the constructor's default ({@link #FULL_LOAD}) rather than
* failing to parse the rule.
*/
public enum CannotMatchBehavior
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO it'd be nicer to define the enums like FALL_THROUGH("fallThrough") and have the "fallThrough" be how they serialize in JSON. SImilar to JoinAlgorithm and CloneQueryMode. Most Druid stuff uses thatStyle rather than THIS_ONE.

* configuration that drives the decision and the wire format of their corresponding {@link LoadSpec} wrapper, so the
* rule layer stays scheme-agnostic.
*/
@JsonTypeInfo(use = JsonTypeInfo.Id.NAME, property = "type")
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we add a new one of these then the PartialLoadRule will fail to deserialize. Just like CannotMatchBehavior, maybe this should be lenient too?

The way to make things lenient is to add a dummy implementation as a defaultImpl, such as QueryCounterSnapshot for example. Servers will then at least know that they're dealing with an unrecognized PartialLoadMatcher.

return patterns;
}

@JsonProperty
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@JsonInclude(NON_EMPTY) would be nice.

"type", WIRE_TYPE,
"delegate", baseLoadSpec,
"projections", resolved,
"ruleFingerprint", fingerprint
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It isn't really a rule fingerprint, it's more of a projection names fingerprint. Maybe just call it fingerprint since projectionNamesFingerprint is a mouthful.

* Base for {@link PartialLoadMatcher} implementations that decide which of a segment's V10 projections to load.
* Subclasses supply the resolution policy via {@link #resolveProjectionNames(DataSegment)}; this base handles
* fingerprint computation and wraps the result into the {@code partialProjection} load-spec wire form consumed
* by the historical-side {@code PartialProjectionLoadSpec} (which does not exist yet, supplied in future work).
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO best not to mention things that don't exist yet. Instead, the next PR should update this text. It will need to update it anyway to remove the "supplied in future work" comment.

*/
public abstract class ProjectionPartialLoadMatcher implements PartialLoadMatcher
{
static final String WIRE_TYPE = "partialProjection";
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LOAD_SPEC_TYPE? Sounds clearer.

{
final Hasher hasher = Hashing.sha256().newHasher();
for (String name : sortedDedupedNames) {
hasher.putString(name, StandardCharsets.UTF_8);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The javadoc for putString says that putUnencodedChars is preferred.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

bump

Copy link
Copy Markdown
Member Author

@clintropolis clintropolis Apr 29, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, I looked into this but didn't change it yet because i sort of forgot about it. FWIW we are using putString in a lot more places in code than putUnecodedChars; the javadocs do say that it is fine so long as only Java is the language involved, so probably fine to change, especially since i have a version is included in the fingerprint so i guess we could just bump version if we ever want to support non-java processes being involved in this stuff in any manner...

private static boolean matchesAny(String name, List<String> patterns)
{
for (String pattern : patterns) {
if (FilenameUtils.wildcardMatch(name, pattern)) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How does this treat \*? I wonder if it escapes the * or if it looks for a literal \ followed by anything.

It's probaly not likely that a projection name will include * or ?, but it would be nice to support escaping anyway, in case they do.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah, it did not handle escapes... had claude help generate a version that supports escaping which seems to not add too much complexity

@Override
public boolean appliesTo(Interval interval, DateTime referenceTimestamp)
{
return Rules.eligibleForLoad(period, interval, referenceTimestamp, includeFuture);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if anything weird will happen because appliesTo(segment.getInterval(), refTs) doesn't always match appliesTo(segment, refTs) like it does with the other rules. This overload is used in two places:

  • DataSourcesResource#isHandOffComplete, where it's used to short-circuit handoff checking for segments that no load rule applies to. I suppose the risk here is that we're in a situation where really no rule matches (because the segment doesn't have any desired projections for a partial-rule, and there's no other rule to fall back to), but the handoff checker thinks the partial-rule would match, and so handoff times out. This seems unlikely to happen— probably in a real cluster there would be another rule to fall back on— but is worth a comment somewhere acknowledging it's a risk. Possibly right here.
  • TieredBrokerHostSelector, where it's used to find the rule that will drive which Broker handles a request when druid.router.tierToBrokerMap is set. This is probably fine: in some configurations of load rules it might change which broker gets a query, but probably the behavior is defensible given that the routing isn't projection-aware.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i switched isHandOffComplete to use the metadata snapshot to use the version of appliesTo that takes a segment so that it will work properly.

I didn't make any changes for TieredBrokerHostSelector, since i'm not really sure how to improve that, i guess we can document it when we document the rest of this stuff.

List<String> projections
)
{
return DataSegment.builder()
SegmentId.of(dataSourceName, theInterval, version, partitionNumber)
);
// Segment isn't published in metadata; it will never be handed off.
if (segment == null) {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[P1] Do not treat a missing metadata snapshot segment as handed off

getRecentDataSourcesSnapshot() may return a snapshot up to the segment poll delay old, so a just-published realtime segment can be absent even though it still needs handoff. Returning true here tells the task the segment will never be handed off and can let it stop waiting before any historical has loaded it. This should return false or force a fresh metadata poll before concluding the segment is not published.

final SegmentId segmentId = SegmentId.of(dataSourceName, theInterval, version, partitionNumber);
DataSegment segment = lookupSegment(segmentsMetadataManager.getRecentDataSourcesSnapshot(), segmentId);
if (segment == null) {
segment = lookupSegment(segmentsMetadataManager.forceUpdateDataSourcesSnapshot(), segmentId);
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[P2] Avoid blocking handoff checks on every metadata miss

When the recent snapshot does not contain the segment, this endpoint now calls forceUpdateDataSourcesSnapshot() before returning true. In the v2 incremental metadata manager that call waits for the next cache sync with a timeout of 2 * druid.manager.segments.pollDuration, so a genuinely absent or removed segment can make each handoffComplete request block for up to the poll interval before returning the same true result. Realtime handoff polling is synchronous and iterates callbacks sequentially, so one missing segment can delay completion checks for others. Consider avoiding the forced refresh on every miss, using a cheaper targeted check, or bounding this path so absent segments still complete promptly.

Copy link
Copy Markdown
Member Author

@clintropolis clintropolis Apr 28, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not sure how often this would happen in practice, we don't currently have easy access direct to metadata store to try to get just a single segment without some plumbing changes i think, and i don't think we should do this anyway because we don't want a bunch of handoff calls hammering the metadata store trying to fetch single segments.

I guess we could return like a 404 or something and have the client retry some number of times in the case of a missing segment if we are worried about this, though that seems like it would have its own complexity since we would need to track retries on that on the client side, so i'd be in favor of not doing that maybe.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's better to not have to worry about this if partial rules aren't being used. How about:

  • add LoadRule#isIntervalBased that returns true if the two appliesTo methods are equivalent
  • check that here; go down the metadata path if false; use the interval only if true.

Copy link
Copy Markdown
Contributor

@capistrant capistrant left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

none of my comments are blocking, will flip to a +1 if you want to pass on them

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

claude observation - told claude to look at the surface area covered by the different tests since it is hard to keep straight just scrolling and it offered a few ideas for further coverage. just food for thought if you want to expand, I don't see it as blocking

  • Mid-string and suffix wildcards. Every match test uses * or ? at the end (user_, p?, session_d). A pattern like daily or user_v2 would catch a class of bugs that the current tests can't
    (e.g. if someone someday "optimizes" the prefix case). Cheap to add, real signal.
  • No-match-returns-null when the pattern simply doesn't hit anything. testExcludeAllMatchedReturnsNull covers the all-excluded path, and testReturnsNullForProjectionAgnosticSegment covers the
    no-projections path. There's no test for "segment has projections, none match patterns" — same observable behavior, but a different code path through resolveProjectionNames.
  • Multiple exclude patterns. Every exclude test uses a single-element excludePatterns list. List.of("user_", "session_temp_") against a segment with names hitting both would pin down that the
    exclude list is OR'd, not AND'd.
  • Case sensitivity. Implicit in the implementation but never asserted. A one-liner — User_* vs user_daily — locks the contract so a future Pattern.CASE_INSENSITIVE change would break a test instead
    of silently shifting behavior.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i told my claude what your claude said and it made more tests 😛


/**
* Selects projections whose names match any of the configured glob patterns, minus any names matching an entry in
* {@code excludePatterns}. Supported glob metacharacters:
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

any consideration to reserving [ and ] in case of desire for supporting character classes in the future?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't really consider this, originally i was using FilenameUtils.wildcardMatch which is the underlying way that LocalInputSource matches file patterns in its spec (via WildcardFileFilter), which doesn't support those patterns either (and also didn't support escaping which is why have switched to this).

I think my preference would be to just add a new implementation of a PartialLoadMatcher if we need more sophisticated matching in the future, but I am open to discussion if anyone feels otherwise.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that is a fair path forward. no sense over complicating things for some maybe useful future use case when a separate path to supporting it is known.

{
final Hasher hasher = Hashing.sha256().newHasher();
for (String name : sortedDedupedNames) {
hasher.putString(name, StandardCharsets.UTF_8);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

bump

Copy link
Copy Markdown
Contributor

@gianm gianm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks ok to me once this is addressed: #19374 (comment)

Copy link
Copy Markdown
Contributor

@capistrant capistrant left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🚀

@clintropolis clintropolis merged commit 8d526bd into apache:master Apr 29, 2026
38 checks passed
@clintropolis clintropolis deleted the partial-load-rules branch April 29, 2026 23:27
@github-actions github-actions Bot added this to the 38.0.0 milestone Apr 29, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants